Extraction and Quantification of Pack-years and Classification of Smoker Information in Semi-structured Medical Records

Authors

Lalindra De Silva

Thomas Ginter

Tyler Forbush

Neil Nokes

Brian Fay

Ted Mikuls

Scott DuVall

Abstract

Electronic medical records contain a wealth of information that is potentially invaluable to many interested parties. However, most of these documents are semi-structured and comprise fragmented English free text, region-specific templates, and clinical sublanguage, among other things, which has made it difficult to apply existing Natural Language Processing tools to them directly and to extract that information. In this work, we focus on a set of medical records pertaining to Rheumatoid Arthritis patients and present a pattern-based methodology for extracting and quantifying pack-year information. We also introduce an extension to those patterns for classifying individual instances within these documents into a set of predefined smoker status classes. Since, to the best of our knowledge, our effort to extract pack-years from medical documents is the first of its kind, we evaluate our approach on a manually selected document collection and present very promising results. We also evaluate our instance classification approach using an additional document collection. Appearing in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).
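
To make the idea of pattern-based pack-year extraction concrete, the following is a minimal sketch and not the authors' method. The regular expressions and the function name `pack_years` are hypothetical and chosen only for illustration; the sole fact relied on is the standard definition that pack-years equal packs smoked per day multiplied by years smoked.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; the paper's actual patterns
# are not given in the abstract.
NUM = r"(\d+(?:\.\d+)?)"

# A directly stated value, e.g. "40 pack-year history" or "30 pack years".
DIRECT = re.compile(NUM + r"\s*-?\s*pack[\s-]*years?", re.IGNORECASE)

# A smoking rate, e.g. "1.5 packs per day", "1 pack/day", "2 ppd".
RATE = re.compile(NUM + r"\s*(?:packs?\s*(?:per|a|/)\s*day|ppd)", re.IGNORECASE)

# A duration, e.g. "for 20 years" or "x 10 yrs".
DURATION = re.compile(r"\b(?:for|x)\s*" + NUM + r"\s*(?:years?|yrs?)", re.IGNORECASE)


def pack_years(text: str) -> Optional[float]:
    """Return a pack-year estimate from free text, or None if nothing matches."""
    direct = DIRECT.search(text)
    if direct:
        # The note states the pack-year value outright.
        return float(direct.group(1))
    rate = RATE.search(text)
    duration = DURATION.search(text)
    if rate and duration:
        # pack-years = packs per day * years smoked
        return float(rate.group(1)) * float(duration.group(1))
    return None
```

For example, `pack_years("Smoked 1.5 packs per day for 20 years")` multiplies the rate by the duration, while `pack_years("40 pack-year history")` reads the stated value directly. Real clinical notes would need far broader patterns (cigarettes per day, ranges, negation, quit dates) than this sketch covers.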

Similar resources

Automatic Extraction of Semantic Content from Medical Discharge Records

Semi-structured medical texts like discharge summaries are rich sources of information that can exploit the research results of physicians by performing statistical analysis of similar cases. In this paper we introduce a system based on Machine Learning algorithms that successfully classifies discharge records according to the smoking status of the patient (we distinguish between current smoker...

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Comparison of Three Information Sources for Smoking Information in Electronic Health Records

OBJECTIVE: The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength i...

Smoking Pattern in Family Members of Smokers in Slums of Surat City, Western India

Background: The relationship between becoming a smoker and having smoker parents, siblings, and relatives is still uncovered in India. The influences of a smoking role model in a family on smoking habits of individuals are yet to be revealed. This study aimed to understand the relationship of smoking abuse of a person with smoking of their family members. Methods: This community-based cross-sec...

Publication date: 2011
